US 2020 Presidential Election Predictions

This is the only site with all 2020 US Election Predictions

Analytical Equation Models

At present, there are 5 prediction models developed using an analytical approach:

  • State-by-state model by Mark Zandi, Moody’s Analytics’s chief economist, and other analysts.
  • National model of popular vote by Professor Ray Fair of Yale University.
  • Electoral vote model by Donald Luskin, Chief Investment Officer at Trend Macrolytics, LLC.
  • State-by-state model by Skyler Dale, contributor to the Medium.com website.
  • State-by-state model by Vinod Bakthavachalam (Vinod B.), a data science analyst and also a contributor to Medium.com.

    The following serves as a brief introduction to these models as they stood in March 2020, rather than a detailed review. None of the models have been updated since the Covid-19 crisis began. In this highly uncertain environment, election predictions may become difficult or impossible until the final few months before the election. Therefore, the focus at this point should be on methodology rather than results.

    I also draw on the work of Professor Allan Lichtman, author of 13 Keys to the White House, who does not rely on analytical models but instead identifies 13 critical keys, some of which resemble the variables used in the analytical models. He has shown remarkable success in election prediction through a straightforward, rational examination of political and economic variables.

    Dr. Ray Fair, a Yale economist, laid the basis for this approach in a 1978 article and in numerous papers spanning four decades. His website, linked at the end of this discussion, provides detailed documentation of his current model. Both Dr. Fair and Skyler Dale use stochastic simulation to estimate the chance that a given candidate will win the election; the probability that the opposing candidate wins is then obtained by subtraction. In some of Dr. Fair's models, the independent variables in the regression equation are treated as random variables, each with its own distribution. The economic variables must also be projected forward in time to estimate their values at the time of the election.
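    As a minimal sketch of stochastic simulation, assuming a toy one-variable vote-share equation, the Python below treats the economic input as a random variable, adds a random regression error, and counts how often the incumbent party's share exceeds 50%. The function name `win_probability`, the coefficients, and the normal distributions are hypothetical placeholders, not Dr. Fair's or Skyler Dale's published values.

```python
import random


def win_probability(n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(vote share > 50%) for a toy one-variable equation.

    All numbers below are hypothetical placeholders, not any published model.
    """
    rng = random.Random(seed)
    a0, a1 = 48.0, 0.7                 # hypothetical intercept and slope
    growth_mean, growth_sd = 2.0, 1.5  # hypothetical forecast of the economic variable
    error_sd = 2.5                     # hypothetical standard error of the regression
    wins = 0
    for _ in range(n_draws):
        growth = rng.gauss(growth_mean, growth_sd)       # independent variable drawn at random
        share = a0 + a1 * growth + rng.gauss(0.0, error_sd)
        if share > 50.0:
            wins += 1
    return wins / n_draws


p = win_probability()
print(f"incumbent party: {p:.3f}, opposing party: {1 - p:.3f}")
```

    The opposing candidate's chance is simply `1 - p`, the subtraction described above. Drawing the economic input and the error together lets uncertainty in the economic forecast widen the spread of outcomes, which a single point forecast would hide.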

    I use the term "Analytical Equation Models" as I see equations as the key feature in these models. I note Vinod B. uses the term "fundamentals based models" in contrast to poll survey based models. The terminology really is unimportant. Using equations or fundamentals, the election can be modeled more than 12 months in advance of the election. In fact, Dr. Fair began his predictions in 2018 for the 2020 election. The state ranking methods began making forecasts in 2019, but will need to be updated after the Democratic primary.

    General Approach

    All models use the general linear form Y = a0 + aX1 + bX2 + cX3 + ..., where Y is the dependent variable, X1, X2, X3 are the independent variables, and a0, a, b and c are coefficients determined by a least squares fit, which minimizes the sum of squared errors between the observed and calculated Y values. The forecasts are not strictly mathematical, as the selection of the independent variables is a matter of the analyst's judgement. Each of the five models listed above is unique in this respect.
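    To make the least squares step concrete, here is a small, dependency-free sketch that solves the normal equations directly. The function name `least_squares` and the toy data are hypothetical; a real analysis would normally use a numerical library (for example NumPy's `linalg.lstsq`) for better numerical stability.

```python
def least_squares(rows, y):
    """Fit y = a0 + a*x1 + b*x2 + ... by ordinary least squares.

    rows: one list [x1, x2, ...] per observation; y: the observed values.
    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination
    and returns [a0, a, b, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 column carries the intercept a0
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
        + [sum(X[t][i] * y[t] for t in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic data lying exactly on Y = 2 + 3*X1 - 1*X2 (hypothetical, for checking only).
rows = [[1, 0], [0, 1], [2, 1], [3, 2], [1, 3]]
y = [5, 1, 7, 9, 2]
coef = least_squares(rows, y)
print([round(c, 6) for c in coef])
```

    Because the synthetic points lie exactly on a plane, the fit recovers a0 = 2, a = 3 and b = -1 up to rounding. With real election data the points scatter around the plane, and the fitted coefficients are the ones that minimize the squared misses.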

    The outcome the public would like to know is simply who will win, a binary choice. Yet this is not a good variable for error minimization, because a win/loss outcome says nothing about how close the result was. Dr. Fair's model and other models use the popular vote share of one party as the dependent variable. However, a candidate can win the electoral vote and fail to win the popular vote. This happened in 5 of the 58 elections through 2016 (Donald Trump, 2016; George W. Bush, 2000; Benjamin Harrison, 1888; Rutherford B. Hayes, 1876; and John Quincy Adams, 1824).
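    To see why vote share alone does not settle the winner, consider three hypothetical states (all numbers invented) with winner-take-all electoral votes. Candidate B wins the national popular vote by a wide margin on the strength of one lopsided state, while Candidate A wins the electoral vote by narrowly carrying the other two.

```python
# Hypothetical three-state example: state -> (electoral votes, votes for A, votes for B)
states = {
    "State 1": (10, 5_100_000, 4_900_000),
    "State 2": (10, 2_600_000, 2_400_000),
    "State 3": (12, 1_000_000, 4_000_000),
}


def tally(states):
    """Return national popular vote totals and winner-take-all electoral vote totals."""
    pop = {"A": 0, "B": 0}
    ev = {"A": 0, "B": 0}
    for electoral_votes, votes_a, votes_b in states.values():
        pop["A"] += votes_a
        pop["B"] += votes_b
        # Winner-take-all (ignores the Maine/Nebraska district method); ties go to B for simplicity.
        winner = "A" if votes_a > votes_b else "B"
        ev[winner] += electoral_votes
    return pop, ev


pop, ev = tally(states)
print(pop)  # {'A': 8700000, 'B': 11300000}
print(ev)   # {'A': 20, 'B': 12}
```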

    The popular vote share creates a bit of a dilemma in judging the track record of predictions. Allan Lichtman claims to have correctly predicted the 2016 election, because he predicted that Donald Trump would win, which he did, even though Trump lost the popular vote. But he also claims to have correctly predicted the 2000 election, because he predicted a win for Al Gore, who won the popular vote but not the electoral vote. He can't have it both ways. When the popular vote and electoral vote winners differ, the ambiguity about what was being predicted opens the door to such claims, and Professor Lichtman certainly appears to use it to his advantage.

    The dependent variable can also be the number of electoral votes one party is expected to win. Assuming a two-party election, the second party's total is calculated by subtraction. This may seem a better choice than the popular vote. However, it can be more difficult to model, as a very small fraction of the electorate in the "swing states" can decide the outcome. Politicians in both parties are increasingly aware of this, and they target the key voters in these states through social media and more traditional means to mobilize their base. Moreover, the more models project a sure win for one candidate, the more they encourage that party's supporters to become complacent and not vote. This feedback mechanism reduces the chances of a successful prediction.

    A combination of both popular vote and electoral vote is possible if done on a state-by-state basis. The regression model estimates one party's vote share in each state, and the electoral votes are then tallied to determine the winner. In fact, it is possible to estimate the mean vote share and its variance (standard error) for one party in each state and, through stochastic (Monte Carlo) simulation of the electoral votes, estimate the chance that each party will win the election. This approach is discussed by Dr. Ray Fair and Vinod Bakthavachalam. It is similar to my own simulation program for calculating win probabilities for the state ranking models.
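    A minimal sketch of state-by-state Monte Carlo simulation, assuming each state's vote share for Party A is normally distributed around a mean with a given standard error. The states, electoral vote counts, means, and standard errors are hypothetical, and drawing each state independently is a simplifying assumption; this is not any of the programs mentioned here.

```python
import random

# Hypothetical inputs: state -> (electoral votes, mean Party A share %, standard error %)
STATES = {
    "Safe A": (200, 60.0, 3.0),
    "Safe B": (200, 38.0, 3.0),
    "Swing 1": (60, 50.5, 2.5),
    "Swing 2": (40, 49.0, 2.5),
    "Swing 3": (38, 50.0, 2.5),
}


def simulate(states, n_draws=50_000, seed=7):
    """Estimate the chance that Party A reaches an electoral vote majority."""
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _, _ in states.values())
    needed = total_ev // 2 + 1  # 270 of 538; a 269-269 tie is not counted as a Party A win
    wins = 0
    for _ in range(n_draws):
        ev_a = 0
        for ev, mean, se in states.values():
            # Independent draw per state (a simplification: real state errors are correlated).
            if rng.gauss(mean, se) > 50.0:
                ev_a += ev
        if ev_a >= needed:
            wins += 1
    return wins / n_draws


p_a = simulate(STATES)
print(f"Party A win probability: {p_a:.3f}")
```

    Real forecasts usually correlate errors across states, since a national swing moves many states together; ignoring that correlation tends to understate the chance of a lopsided result.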

    Moody's models (there are three, with different sets of variables) predict the US election state by state. Their dependent variable is the popular vote percentage in each state, from which the expected national electoral vote is calculated. Their article, linked at the end of this discussion, tabulates the expected blue and red states under each model and each set of assumptions. The states with a fairly even mix of blue and red predictions correspond quite well with the lean, tilt, and toss-up states on the state ranking maps.

    Among the independent variables, models may include a "dummy variable" with a value of 0 or 1 to indicate whether a candidate is running for re-election. As Don Luskin points out in his video (see links below), incumbent candidates have a higher chance of being re-elected. Since 1976, we've had two one-term presidents (Carter and George H. W. Bush) and four two-term presidents (Reagan, Clinton, George W. Bush, and Obama). So, Luskin's approach is to identify the negative effects that denied Carter and George H. W. Bush re-election.
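    In the notation of the general linear form, an incumbency dummy simply adds one more term. The symbols I and d below are illustrative labels, not taken from any of the models discussed.

```latex
Y = a_0 + d\,I + a X_1 + b X_2 + \cdots,
\qquad
I = \begin{cases}
  1 & \text{if the incumbent is running for re-election} \\
  0 & \text{otherwise}
\end{cases}
```

    Holding X1, X2, ... fixed, the prediction for an incumbent race exceeds that for an open-seat race by exactly d, since the dummy only shifts the intercept from a0 to a0 + d.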

    Election modeling is one discipline where more data and more variables do not necessarily mean a better prediction. It may sound good when forecasters claim to have models that correctly predicted every election since 1952, for example, but this raises the question of relevance, as the electorate, parties and issues are very different today. Also, these claims are often made by adding variables and adjusting coefficients through a least squares fit until the model correctly reproduces prior elections. Such models may be better at explaining the past than predicting the future.
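    An extreme, deliberately artificial illustration of this point: with as many free coefficients as past elections, a curve can reproduce every past result exactly and still predict nonsense. The data below are invented, and Lagrange interpolation stands in for "adding variables and adjusting coefficients" only as an analogy.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical "history": one explanatory value per past election and the observed vote share (%).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [52.0, 48.0, 51.0, 47.0, 53.0, 49.0]

# In-sample error: the curve reproduces every past election exactly.
in_sample_error = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))

# Out of sample, the same curve swings far outside any plausible range.
next_prediction = lagrange_predict(xs, ys, 7.0)
print(in_sample_error, round(next_prediction, 6))
```

    The perfect in-sample fit predicts about -90% for the next point, an impossible vote share, even though every past result is matched exactly.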

    Model building really involves selecting what is relevant to the upcoming election. A key element is the time span of the data: how far back in time the data remain meaningful enough to improve the model. Selecting relevant data is a concern common to both the state ranking and the analytical models, and it is definitely a judgement call for the analysts. The 270towin.com maps, linked from this website, include the historical voting patterns going back to 2000, highlighting the solid red and blue states. Are the five elections from 2000 to 2016 sufficient data? Could going back further weaken the historical record, as older elections become less relevant? Moody's model used all election results from all states beginning in 1982, or 510 data values for each variable.

    Prior Election Results as a Political Variable

    Moody's explains this variable in their models as follows:

    "To capture the political realities of each state, all three models rely heavily on the share of the overall vote that the current incumbent party received in a given state during the prior presidential election. This is the most significant variable in the model and single-handedly decides the fate of most states. It is the variable that ensures Texas almost always shows up red, and California is almost always blue. For the remaining states, where the outcome cannot be largely explained by party allegiance alone, three other political variables come into play."

    Of the three political variables mentioned, only the approval rating variable has an impact on the 2020 forecast. Moody's found that the more meaningful variable is not the approval rating itself but the change in approval rating while in office. The change in Trump's approval rating has been much smaller than for other presidents, hence this variable did not detract much from the overall result (popular vote by state). The two other political variables (fatigue and Democratic incumbent) were used to improve the model's fit to prior elections.

    The idea of highly similar voting patterns from one election to the next is supported by Skyler Dale's work, as he found the median absolute change in a party's state vote share to be 3.7%. So, there is a very low probability that a state where one party won 60% of the vote will switch parties in the following election. He concludes, "We can typically get a good understanding of the states that will be close in the next general election by simply looking at the prior one."
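    The following sketch shows how a median absolute change could be computed and then used to flag states likely to be close next time. The vote shares are invented, so the resulting median differs from Skyler Dale's 3.7%, and the rule "flag a state if its latest share is within one median shift of 50%" is a hypothetical illustration, not Dale's method.

```python
from statistics import median

# Hypothetical vote shares (%) for one party in two consecutive elections.
election_1 = {"S1": 60.0, "S2": 51.0, "S3": 47.5, "S4": 35.0, "S5": 55.0}
election_2 = {"S1": 58.0, "S2": 48.0, "S3": 49.0, "S4": 38.5, "S5": 57.0}

# Median absolute change in vote share between the two elections.
changes = [abs(election_2[s] - election_1[s]) for s in election_1]
typical_shift = median(changes)


def likely_close(latest_shares, shift):
    """Flag states whose latest share lies within one typical shift of 50%."""
    return sorted(s for s, share in latest_shares.items() if abs(share - 50.0) <= shift)


print(typical_shift, likely_close(election_2, typical_shift))  # 2.0 ['S2', 'S3']
```

    A state sitting at 58% would need a shift of four typical elections' worth to flip, which is why the prior result alone sorts most states into safe and competitive groups.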

    Economic Variables

    Each of the analytical models contains economic variables. Dr. Ray Fair's equation contains 3 economic variables:

    Democrats' share of the popular vote: PV = 45.6 - aG + bP - cZ, with coefficients a, b and c all positive and less than 1.

  • G = Growth rate of real GDP per capita in the first 3 quarters of 2020
  • P = Growth rate of the GDP deflator in the first 15 quarters of the Trump administration, 2017:1 to 2020:3, annual rate.
  • Z = Number of quarters in the first 15 quarters of the Trump administration in which the growth rate of real per capita GDP is greater than 3.2% at an annual rate. In the most current case, Z = 0.

    Since no data are available for the first 3 quarters of 2020, the Fair model uses a second model (an econometric stochastic simulation model) to estimate these economic values. The GDP deflator is used as an indicator of inflation, so a high value would add to the Democrats' popular vote share.
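    A minimal sketch of evaluating an equation of this form. The coefficients a, b and c below are hypothetical placeholders chosen only to satisfy the stated constraint (positive and less than 1), and the inputs G and P are invented; Dr. Fair's actual estimates and inputs are in his documentation linked below.

```python
def fair_form_share(G, P, Z, a, b, c, intercept=45.6):
    """Democratic popular vote share (%) in the form PV = 45.6 - a*G + b*P - c*Z.

    a, b, c must be supplied by the caller; the values used below are hypothetical.
    """
    if not all(0 < coef < 1 for coef in (a, b, c)):
        raise ValueError("a, b and c are expected to be positive and less than 1")
    return intercept - a * G + b * P - c * Z


# Hypothetical placeholder coefficients and inputs, NOT Dr. Fair's estimates or data.
a, b, c = 0.6, 0.7, 0.9
baseline = fair_form_share(G=2.0, P=1.8, Z=0, a=a, b=b, c=c)
recession = fair_form_share(G=-5.0, P=1.8, Z=0, a=a, b=b, c=c)
print(f"{baseline:.2f} {recession:.2f}")  # 45.66 49.86
```

    Because G enters with a minus sign, each percentage point of lower growth raises the Democratic share by a points, and each quarter counted in Z lowers it by c points.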

    Moody's constructed 3 models, termed the "Pocketbook", "Stock Market" and "Unemployment" models. Each has a slightly different set of variables, but the one variable common to all 3 models is the 2-year percent change in real income per household. The other variables are U.S. gas prices, the 2-year percent change in nominal house prices, the 1-year percent change in the S&P 500 average, and the 2-quarter percentage-point change in the unemployment rate.

    As published in September 2019, each of these 3 models predicted a Trump victory based on a state-by-state evaluation. The expected electoral votes under the Pocketbook, Stock Market and Unemployment models were 351, 289 and 332 EVs, respectively.
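    A quick arithmetic check on the three Moody's figures. Averaging electoral vote totals is only an illustration here; the "average of our three models" that Moody's refers to may be computed differently, for example by averaging vote shares state by state.

```python
# Expected electoral votes for Trump under each Moody's model (September 2019 figures quoted above).
ev_by_model = {"pocketbook": 351, "stock market": 289, "unemployment": 332}

average_ev = sum(ev_by_model.values()) / len(ev_by_model)
all_above_majority = all(ev >= 270 for ev in ev_by_model.values())

print(average_ev)          # 324.0
print(all_above_majority)  # True
```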

    Don Luskin's model prediction of a strong Trump win was published in March 2019. The model included six variables: real GDP, CPI inflation, oil price, disposable income, payrolls, and personal tax burden. Conditions suddenly worsened in March 2020 with the horrific consequences of the coronavirus. Don Luskin had written in March 2019, "But a GOP candidate could not survive even a mild recession, because it would come on the heels of today’s very hot economy."

    Similarly, Moody's states, "Under a moderate recession scenario, in which U.S. real GDP declines cumulatively by more than 2% over the next year, the average of our three models would point to a Democratic victory."

    Other variables and considerations

    Variables that capture more subtle effects, as noted in the Luskin and Moody's analyses, may not be meaningful in 2020 predictions, given the unique national focus on the coronavirus crisis. Moody's incorporates a "fatigue" variable, which represents the electorate's desire to change parties after two consecutive terms. It does not apply to the 2020 election, as the Republicans have held the White House for only one term.

    A win by the incumbent's party in the midterm election is considered a positive indicator variable. The Republicans lost control of the House in 2018, so this variable does not add to Trump's expected EV. Skyler Dale states, "There does appear to be a mild relationship between the two variables. Party performance in the congressional races is associated with performance in the general election. When added to the model, the feature provides modest but clear gains in predictive power."

    Allan Lichtman's 13 Keys to the White House introduces other variables that he found to be the essential keys. Note that he does not use regression analysis. Under his rule, the incumbent party loses when six or more of the 13 keys are false. His keys include "charisma," which can count twice: one key is gained with a charismatic incumbent and a second with a non-charismatic challenger. Lichtman updated his prediction on August 4, 2020, and says Trump will lose, as he has 7 false keys. The impeachment over the Ukraine scandal caused the "no scandals" key to turn false.
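    Lichtman's decision rule is simple enough to express in a few lines: the incumbent party is predicted to lose when six or more of the 13 keys are false. In the sketch below, the key names `key_1` to `key_13` and which seven of them are false are arbitrary placeholders, not Lichtman's actual 2020 assessment.

```python
def incumbent_party_wins(keys):
    """Apply the Keys decision rule: the incumbent party loses if six or more of the 13 keys are false.

    keys: mapping of key name -> True/False (True favors the incumbent party).
    """
    if len(keys) != 13:
        raise ValueError("expected exactly 13 keys")
    false_keys = sum(1 for value in keys.values() if not value)
    return false_keys < 6


# Placeholder assessment: keys 1-7 false, keys 8-13 true (7 false keys in total).
keys = {f"key_{i}": i > 7 for i in range(1, 14)}
print(incumbent_party_wins(keys))  # False
```

    Unlike the regression models, there are no fitted coefficients here: each key counts equally, and only the number of false keys matters.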

    Summary

    There is common ground between the State Ranking and the analytical equation approaches when the latter are done on a state-by-state basis. Both Moody's model and the work by Skyler Dale summarize which states are likely to be solid blue or red, and which others are much less clear. And, as is generally agreed, it is these 8 or 9 challenging swing states that will be the final arbiters of the 2020 election.

    Forecasts made far in advance of an election depend on a continuity of trends, which a global crisis such as the coronavirus disrupts. The forecasters can all claim a high success rate in prior elections, but is this enough? After all, their models have been trained or tuned with data from a very different set of circumstances. So, the question I believe is really on the minds of analysts in this worldwide crisis is: "Is the 2020 election forecastable by any means?"

    I think Richard Hamming got it right many years ago with his famous quote: "The purpose of computing is insight, not numbers."

    David Lord

    April 4, 2020

    Links:

    Moody's Analytics - 2020 US Presidential Model

    Don Luskin Model, Trend Macrolytics, Chief Investment Officer

    Professor Ray Fair webpage, Yale University economist, contains very detailed documentation on current and prior predictions

    Presidential and Congressional Vote-Share Equations: November 2018 Update Ray C. Fair (pdf file)

    Skyler Dale, US Presidential Model documentation, Medium.com

    Vinod Bakthavachalam Model Documentation, Medium.com